test #2

Open: wants to merge 774 commits into base: v3
Conversation

harveenchadha

test

gegallego and others added 30 commits December 21, 2021 18:18
Summary:
# Before submitting

- [X] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [X] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Fixes #3882
Fixes #3884

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3887

Reviewed By: yuntang

Differential Revision: D33152073

Pulled By: kahne

fbshipit-source-id: 7f5c90a9876320e7c5c406ed032681452c7c5056
Summary: add METEOR scorer; fix chrF scorer config

Reviewed By: hygong-fb

Differential Revision: D33273312

fbshipit-source-id: 3fcb5b2479fb6cc90e9f0235886c658e0c586fba
Summary:
update ignore_prefix_size in label_smoothed_cross_entropy
- lprobs is always B x T x C in the current models
- lprobs.batch_first defaulted to `False`, which contradicts the fact above (see the sketch below)
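
For context, a minimal sketch of the prefix handling this fix implies, assuming batch-first `lprobs` (B x T x C) and `target` (B x T); the names below are illustrative and not the exact fairseq criterion code:

```
import torch

def strip_prefix(lprobs: torch.Tensor, target: torch.Tensor, ignore_prefix_size: int):
    # lprobs: B x T x C, target: B x T -- slice away the first ignore_prefix_size
    # time steps along dim 1, which is only correct when the batch dimension comes first.
    if ignore_prefix_size > 0:
        lprobs = lprobs[:, ignore_prefix_size:, :].contiguous()
        target = target[:, ignore_prefix_size:].contiguous()
    return lprobs, target
```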

Reviewed By: sravyapopuri388

Differential Revision: D33304121

fbshipit-source-id: 9391b48c7036642d9741d254b03c46389a4fe584
Summary: fix evaluation tokenizer for sacrebleu >= 2.0.0

Reviewed By: sravyapopuri388

Differential Revision: D33306119

fbshipit-source-id: c0d0d45df201de7a869aae1680b7ae49b590414a
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Releasing code, model & recipe for the work "Direct speech-to-speech translation with discrete units".
Main changes:
1. examples/speech_to_speech
2. tasks/speech_to_speech
3. data/audio/speech_to_speech_dataset
4. models/speech_to_speech
5. criterions/speech_to_speech_criterion

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: fairinternal/fairseq-py#2756

Reviewed By: sravyapopuri388, kahne

Differential Revision: D32923969

Pulled By: an918tw

fbshipit-source-id: 838ba42457f4684e9767d15b5b514681a9572b39
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Applied `black` and `isort` to fix failing CI

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: fairinternal/fairseq-py#2834

Reviewed By: vedanuj

Differential Revision: D33262876

Pulled By: dianaml0

fbshipit-source-id: 03215c276fcddda9f7c78971bf6ed7c5ac21b2ee
Summary: [Fairseq] Add regularization for multihead attention module and ffn module

Reviewed By: dianaml0

Differential Revision: D32441521

fbshipit-source-id: c648c1f8ec1a3310ba90c4952cdd40a21b959d26
…_model()

Summary: Add strict option to checkpoint_utils.load_pretrained_component_from_model()

Reviewed By: sravyapopuri388

Differential Revision: D33304224

fbshipit-source-id: 2284a21dfea7810ec212f15daadeeeb45c6dca1b
Summary:
Update xm_transformer
- Added the V1 arch (FFNs before/after the convolutions in the adaptor, which didn't exist in the V0/ACL paper arch); see the sketch below
- Added args for gradient checkpointing and fully sharded data parallel
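
A rough sketch of the V1 adaptor shape described in the first bullet, assuming a single strided Conv1d wrapped by position-wise FFNs; module names here are illustrative, not the actual xm_transformer code:

```
import torch
import torch.nn as nn

class AdaptorV1Sketch(nn.Module):
    def __init__(self, dim: int, kernel: int = 3):
        super().__init__()
        self.ffn_pre = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.conv = nn.Conv1d(dim, dim, kernel, stride=2, padding=kernel // 2)
        self.ffn_post = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: B x T x C
        x = x + self.ffn_pre(x)                           # FFN before the convolution
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)  # strided conv downsamples T
        return x + self.ffn_post(x)                       # FFN after the convolution
```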

Reviewed By: sravyapopuri388

Differential Revision: D33144404

fbshipit-source-id: 548c917824ebd2aa926c83d5ba62fbf648cf4b97
Summary: fix SacrebleuScorer.score()

Reviewed By: sravyapopuri388

Differential Revision: D33311843

fbshipit-source-id: 8536baceab6ef2e7c9c4a9a8a005abaa6a9229f0
Summary: add xm_transformer test; refactor speech tests

Reviewed By: sravyapopuri388

Differential Revision: D33312231

fbshipit-source-id: a2b2695fc3c10d5420abbe23a4a3005777aa2ae1
Summary: add hub interface for TTS

Reviewed By: pipibjc

Differential Revision: D33394399

fbshipit-source-id: 4efb5b08cf04ef77a469006f9822e22a27112ac6
Summary: add hub interface for S2T

Reviewed By: sravyapopuri388

Differential Revision: D33394412

fbshipit-source-id: bf844822261c213bafacd9b2c71d9d591bc0f3a6
Summary:
# Before submitting

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

## What does this PR do?
Applies `black` and `isort` to files

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: fairinternal/fairseq-py#2860

Reviewed By: Mortimerp9

Differential Revision: D33456637

Pulled By: dianaml0

fbshipit-source-id: 560b8d3a8f589cbecc92d0d21163596b5d47d609
Summary: Add test for DualInputS2TTransformerModel at examples/speech_text_joint_to_text/models/s2t_dualinputtransformer.py

Reviewed By: kahne

Differential Revision: D33284188

fbshipit-source-id: c02b697fc7734425661e00bbb606852b5d94a587
Summary:
**This PR**

- Adds a conformer layer based on https://arxiv.org/pdf/2005.08100.pdf (see the sketch after this list).
- The conformer implementation supports multihead attention with three positional embedding types: absolute positional embedding, relative positional encoding, and rotary positional embedding.
- Adds a conformer encoder with conv1d subsampling and positional embedding, followed by N conformer layers.
- Adds an S2T_Conformer model based on the conformer encoder and a transformer decoder.
- Adds conformer support in Wav2Vec2.
- Adds unit tests for core modules.
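
As referenced in the list above, a hedged sketch of the macaron-style block ordering a conformer layer uses. This is illustrative only: it omits the relative/rotary positional embeddings mentioned above and is not the fairseq implementation.

```
import torch
import torch.nn as nn

class ConformerBlockSketch(nn.Module):
    """Illustrative macaron-style conformer block (not the fairseq module)."""

    def __init__(self, dim: int, heads: int = 4, kernel: int = 31):
        super().__init__()
        self.ffn1 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(dim)
        self.pointwise1 = nn.Conv1d(dim, 2 * dim, 1)
        self.depthwise = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.bn = nn.BatchNorm1d(dim)
        self.pointwise2 = nn.Conv1d(dim, dim, 1)
        self.ffn2 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
        self.final_norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: B x T x C
        x = x + 0.5 * self.ffn1(x)                            # half-step feed-forward
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]     # self-attention
        h = self.conv_norm(x).transpose(1, 2)                 # B x C x T for the conv module
        h = nn.functional.glu(self.pointwise1(h), dim=1)      # gated pointwise conv
        h = self.pointwise2(nn.functional.silu(self.bn(self.depthwise(h))))
        x = x + h.transpose(1, 2)                             # convolution module residual
        x = x + 0.5 * self.ffn2(x)                            # half-step feed-forward
        return self.final_norm(x)
```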

**Verification**

- Verified the setup on MuST-C En-De S2T, CoVoST 2 Es-En S2T, and Librispeech ASR to ensure the implementation is correct.
- For the S2T setups, performance is similar to or better than the transformer-based models.
- Wav2vec2 pretraining and fine-tuning on Librispeech showed improvements over the corresponding transformer baselines.
- [WIP] Experiment log: https://docs.google.com/document/d/1QI-ROWVenUEXPJoHTaKD85Fq7T8ZXNc8bc54MzgwJjA/edit#

**Next steps**
- Add regression tests
- Add README and open source checkpoints

Pull Request resolved: fairinternal/fairseq-py#2859

Reviewed By: kahne

Differential Revision: D33434092

Pulled By: sravyapopuri388

fbshipit-source-id: 62f22b917a332481370750e04a439e05832a2282
Summary:
- The goal of this framework is to support benchmarking various speech-to-speech translation (S2ST) models in terms of runtime, max memory consumption, and total number of floating-point operations (FLOPs).
- It is a generic framework and can easily be extended to support any fairseq model. To benchmark performance accurately, the core inference modules are re-implemented based on fairseq_cli/generate.py (core.py/Processing) and examples/speech_to_text/generate_waveform.py (core.py/SpeechGeneration).
- To ensure that end-to-end models and cascaded models are compared fairly, for cascaded models we only consider the performance metrics for model inference at all stages, ignoring any intermediate data and I/O processing costs.
- We run all benchmarking on CPU, as it is generally used in production environments and because of the lack of good benchmarking library support for GPUs (a simplified sketch follows below).
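
A simplified sketch of CPU-side benchmarking under the assumptions above: wall-clock runtime plus Python-level peak memory via tracemalloc. This is illustrative only; the actual framework also counts FLOPs, and tracemalloc does not see allocations made outside the Python allocator (e.g. most tensor storage).

```
import time
import tracemalloc

def benchmark_cpu(fn, *args, repeat: int = 5):
    # Run fn(*args) several times and report the best wall-clock time
    # and the peak Python heap usage observed across the runs.
    times = []
    tracemalloc.start()
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return min(times), peak_bytes
```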

Pull Request resolved: fairinternal/fairseq-py#2852

Reviewed By: an918tw

Differential Revision: D33398060

Pulled By: sravyapopuri388

fbshipit-source-id: cffa19820deaa4ee7f629845944cbb6223498f4d
Summary:
Support multihead attention pruning for fairseq. For example, a user can apply pruning on top of a RoBERTa base model by specifying the argument "--mha-heads-to-keep 8". The user also needs to provide a checkpoint that is already pruned so that the pruned checkpoint can be loaded correctly.

The pruning idea can be summarized as:
1. Fine-tune the model (e.g. a RoBERTa encoder) on a given dataset with regularization.
2. After the model is trained, use the get_reserve_head_index and _adaptive_prune_heads functions to get the top X heads by importance, then use that ranking to prune a new RoBERTa encoder and save the pruned checkpoint manually.
3. Fine-tune the new RoBERTa encoder from the checkpoint saved above.

To avoid registering a separate pruned version of RoBERTa, the --mha-heads-to-keep argument is used to prune the RoBERTa model into a pruned version that matches the pruned checkpoint.
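
A minimal sketch of the head-ranking step described above, using an L1-norm score as a stand-in for the importance learned with regularization; `rank_heads_by_l1` is a hypothetical helper, not fairseq's get_reserve_head_index / _adaptive_prune_heads:

```
import torch

def rank_heads_by_l1(q_proj: torch.Tensor, k_proj: torch.Tensor, v_proj: torch.Tensor,
                     num_heads: int, heads_to_keep: int):
    # Each projection weight is (embed_dim, embed_dim); split its rows into per-head
    # blocks and score each head by the L1 norm of its Q/K/V parameters (a stand-in
    # importance score), keeping the indices of the top heads_to_keep heads.
    head_dim = q_proj.size(0) // num_heads
    scores = []
    for h in range(num_heads):
        rows = slice(h * head_dim, (h + 1) * head_dim)
        scores.append(sum(p[rows].abs().sum() for p in (q_proj, k_proj, v_proj)).item())
    keep = torch.tensor(scores).topk(heads_to_keep).indices
    return sorted(keep.tolist())  # head indices to retain when rebuilding the module
```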

Reviewed By: dianaml0

Differential Revision: D32449003

fbshipit-source-id: a952fd9ad723a6dbc5c2af574c42f2e9a1fa27dc
Summary:
This is the equivalent of PR fairinternal/fairseq-py#2697, but on top of main instead of gshard (cherry-picked and merged the squash):

* reorganize preprocess.py code a bit
* use Binarizers objects in the multiprocess code
* clean up the make_binary
* multiprocess logic
* learn to count
* format and doc string
* add basic test for vocab binarizer
* generalize to one line
* move multiprocess in binarizer

Testing:
```
python -m fairseq_cli.preprocess --only-source --trainpref ~/fixathon/small_vocab_test/train.in --destdir ~/fixathon/small_vocab_test/data-bin.cherry --workers 20
python -m fairseq_cli.preprocess --only-source --trainpref ~/fixathon/small_vocab_test/train.in --destdir ~/fixathon/small_vocab_test/data-bin.main --workers 20
```

```
# the binarized train.bin files should be identical
diff <(md5sum ~/fixathon/small_vocab_test/data-bin.cherry/train.bin | cut -d' ' -f1) \
     <(md5sum ~/fixathon/small_vocab_test/data-bin.main/train.bin | cut -d' ' -f1)
```

```
diff ~/fixathon/small_vocab_test/data-bin.main/dict.txt ~/fixathon/small_vocab_test/data-bin.cherry/dict.txt
```

Pull Request resolved: fairinternal/fairseq-py#2738

Reviewed By: sshleifer, dianaml0

Differential Revision: D32830875

Pulled By: Mortimerp9

fbshipit-source-id: e7463d5cdd96a877691bf39666daa319ebb3dcb8
… paper (#4129)

Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

Update commands, checkpoints and contact info.

## PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #4129

Reviewed By: dianaml0

Differential Revision: D33556233

Pulled By: shruti-bh

fbshipit-source-id: 3bad45b3e154fa11d4b13776d97408ce1a166113
Summary: As title

Reviewed By: nayansinghal

Differential Revision: D32005717

fbshipit-source-id: ebdf1ed0e4a2b9fccffd841d0fa7be0b50ec6b79
Summary:
Support FFN pruning for fairseq. For example, a user can apply pruning on top of a RoBERTa base model by specifying the argument "--ffn-blocks-to-remove 1024". The user also needs to provide a checkpoint that is already pruned so that the pruned checkpoint can be loaded correctly.
The pruning idea can be summarized as:
1. Fine-tune the model (e.g. a RoBERTa encoder) on a given dataset with regularization.
2. After the model is trained, use the _get_fc_rank and _prune_fc_layer functions to get the top X most important blocks in each transformer layer, then use that ranking to prune a new RoBERTa encoder and save the pruned checkpoint manually.
3. Fine-tune the new RoBERTa encoder from the checkpoint saved above.
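
A minimal sketch of pruning FFN intermediate units, again using an L1-norm score as a stand-in for the learned importance; `prune_ffn_blocks` is a hypothetical helper, not fairseq's _get_fc_rank / _prune_fc_layer:

```
import torch
import torch.nn as nn

def prune_ffn_blocks(fc1: nn.Linear, fc2: nn.Linear, blocks_to_remove: int):
    # Score each intermediate unit by the L1 norm of its fc1 row plus its fc2 column,
    # then rebuild fc1/fc2 keeping only the highest-scoring units.
    importance = fc1.weight.abs().sum(dim=1) + fc2.weight.abs().sum(dim=0)
    keep = importance.topk(fc1.out_features - blocks_to_remove).indices.sort().values
    new_fc1 = nn.Linear(fc1.in_features, keep.numel(), bias=fc1.bias is not None)
    new_fc2 = nn.Linear(keep.numel(), fc2.out_features, bias=fc2.bias is not None)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep])
        new_fc2.weight.copy_(fc2.weight[:, keep])
        if fc1.bias is not None:
            new_fc1.bias.copy_(fc1.bias[keep])
        if fc2.bias is not None:
            new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2
```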

Reviewed By: dianaml0

Differential Revision: D33525055

fbshipit-source-id: 5087140ee891d6ec9266726e3a477947c233412c
Summary:
Add scripts for multihead attention selection in multilingual and multi-domain training from the following paper:
"Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling", NeurIPS 2021.

Reviewed By: yuntang

Differential Revision: D31781212

fbshipit-source-id: 8e1a596826f682f80730c251ec31c68df0de6516
Summary: Add an option to use the EMA model for decoding in the transducer IPL recipe by passing --ipl-decode-ema. Note that EMA should be enabled, as in diff D24238379 (8feccf9), using the options --store-ema, --ema-start-update, and --ema-decay.

Reviewed By: cruvadom

Differential Revision: D31983366

fbshipit-source-id: 2bf63b3f7d1b5fa8804b3a7e9bfab71a463ca957
Summary:
Add scripts for multihead attention selection in multilingual and multi-domain training from the following paper:
"Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling", NeurIPS 2021.

Reviewed By: yuntang

Differential Revision: D31802221

fbshipit-source-id: 8c69b89bda29e6857bd3af02979c07e1b5cf49f1
Summary:
Preliminaries for the data2vec release, including some minor improvements and bug fixes.

The most important change is that we now default to raising an exception when fields in the config do not have a corresponding field in the model dataclass (see the sketch below).
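
A minimal sketch of that stricter default, assuming a plain dict config checked against a model dataclass; `check_unknown_fields` is a hypothetical helper, not fairseq's actual dataclass utilities:

```
import dataclasses

def check_unknown_fields(cfg: dict, dataclass_type) -> None:
    # Raise instead of silently ignoring config keys that the model dataclass does not define.
    known = {f.name for f in dataclasses.fields(dataclass_type)}
    unknown = sorted(set(cfg) - known)
    if unknown:
        raise ValueError(f"config fields with no matching model dataclass field: {unknown}")
```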

Pull Request resolved: fairinternal/fairseq-py#2929

Reviewed By: wnhsu

Differential Revision: D33649708

Pulled By: alexeib

fbshipit-source-id: 629bdb4c361550740b451c570c2005bb956c6fcb
Summary:
new data2vec models

Pull Request resolved: fairinternal/fairseq-py#2936

Reviewed By: jacobkahn

Differential Revision: D33674643

Pulled By: alexeib

fbshipit-source-id: 2c2b4fae541974587b50a78a44d34033e9b5192d
Summary:
minor fix

Pull Request resolved: fairinternal/fairseq-py#2939

Reviewed By: michaelauli

Differential Revision: D33685330

Pulled By: alexeib

fbshipit-source-id: 4d6c6edb1fab9d0d56a6e03c0a2b43a864f1d07a
Summary:
1. Add XGLM downstream task evaluation examples
2. Add bibtex citation of XGLM arXiv paper

Pull Request resolved: #4154

Reviewed By: xianxl

Differential Revision: D33748846

Pulled By: todpole3

fbshipit-source-id: ce4dfce2fccf92742f124f12a0d9a388280320fa
kirill-fedyanin and others added 30 commits May 24, 2023 06:09
* Create MMS_ASR_Inference_Colab.ipynb

Added a tutorial as a Google Colab notebook (IPYNB) with small modifications. Credit to epk2112: https://github.com/epk2112/fairseq_meta_mms_Google_Colab_implementation

* Add readme & ipynb

* Add readme & ipynb

* change colab hyperlink

---------

Co-authored-by: Andros Tjandra <[email protected]>
* [MMS] Create Colab Notebook for LID task

* Update README.md

* Update README.md
* Update MMS_ASR_Inference_Colab.ipynb

* Update mms_infer.py
* Add TTS Colab notebook


---------

Co-authored-by: Bowen Shi <[email protected]>
* Fix ț filtering in Romanian at inference

* mps support + full checkpoints (discriminator+optimizer)

---------

Co-authored-by: Bowen Shi <[email protected]>
* fix missing extra args in ConformerLayer

* fix extra args issue

---------

Co-authored-by: Andros Tjandra <[email protected]>
* Add transformers MMS checkpoints to docs

* Apply suggestions from code review

* Apply suggestions from code review

* Update examples/mms/README.md

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <[email protected]>

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <[email protected]>

---------

Co-authored-by: Sanchit Gandhi <[email protected]>
…for hubert bf16 models (#5285)

* add conv_batch_norm for hubert to support bf16

* linting

Co-authored-by: Bowen Shi <[email protected]>
Fix MMS alignment code
* Mention sox install through apt, on top of the Python wrapper
* Fix argument name in example command
* multires hubert core

* update core codebase on multiresolution hubert

* add examples

* adding entries to pretrained models (not finished)

* add other ablation models

* add multilingual

* add decode.sh, train.sh, finetune.sh and update links in README.md

* fix readme

* clean the codebase

---------

Co-authored-by: Anna Sun <[email protected]>
MMS Zero-shot release
* init lid rerank

* init lid rerank

* add greedy ctc score